Ramsey Games Against a One-Armed Bandit

Authors

  • Ehud Friedgut
  • Yoshiharu Kohayakawa
  • Vojtech Rödl
  • Andrzej Rucinski
  • Prasad Tetali
Abstract

We study the following one-person game against a random graph: the Player’s goal is to 2-colour a random sequence of edges e1, e2, . . . of a complete graph on n vertices, avoiding a monochromatic triangle for as long as possible. The game is over when a monochromatic triangle is created. The online version of the game requires that the Player should colour each edge when it comes, before looking at the next edge. While it is not hard to prove that the expected length of this game is about n, the proof of the upper bound suggests the following relaxation: instead of colouring online, the random graph is generated in only two rounds, and the Player colours the edges of the first round before the edges of the second round are generated. Given the size of the first round, how many edges can there be in the second round if the Player is to win with reasonable probability? In the extreme case, when the first round consists of a random graph with cn edges, where c is a positive constant, we show that the Player can win only if constantly many edges are generated in the second round. The analysis of the two-round version of the game is based on a delicate lemma concerning edge-coloured random graphs.
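To make the online version of the game concrete, here is a minimal Python sketch (our own illustration, not the Player strategy analysed in the paper): a greedy Player colours a uniformly random edge ordering of the complete graph on n vertices, always choosing a colour that does not complete a monochromatic triangle if one exists, and loses once both colours would. All names and the greedy rule are illustrative assumptions.

```python
import itertools
import random

def creates_mono_triangle(colour_of, u, v, colour, n):
    """Check whether colouring edge {u, v} with `colour` would complete a
    monochromatic triangle with some third vertex w."""
    for w in range(n):
        if w in (u, v):
            continue
        if (colour_of.get(frozenset((u, w))) == colour
                and colour_of.get(frozenset((v, w))) == colour):
            return True
    return False

def play_online_game(n, rng=random):
    """Play the online game with a greedy Player on a random edge order.
    Returns the number of edges coloured before the Player is forced to
    create a monochromatic triangle."""
    edges = list(itertools.combinations(range(n), 2))
    rng.shuffle(edges)
    colour_of = {}
    for t, (u, v) in enumerate(edges):
        # Greedy rule (illustrative): pick any colour that is safe right now.
        safe = [c for c in ("red", "blue")
                if not creates_mono_triangle(colour_of, u, v, c, n)]
        if not safe:
            return t  # both colours complete a triangle: game over
        colour_of[frozenset((u, v))] = safe[0]
    return len(edges)  # survived the whole sequence (only plausible for tiny n)

if __name__ == "__main__":
    random.seed(0)
    n = 60
    lengths = [play_online_game(n) for _ in range(20)]
    print(f"n = {n}, mean game length ~ {sum(lengths) / len(lengths):.0f}")
```

Running such a simulation for a few values of n gives an empirical feel for the claim that the expected length of the game is about n, though of course the greedy Player need not match the optimal strategy.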

Similar Articles

The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games

Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address this problem with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called Naïve Sampling, based on a variant of the Multi-armed Bandit problem called the Combinatorial Multi-armed Bandit (CMAB) problem. We present a new MCTS algorithm based on Naïve Sampling call...
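For orientation, a minimal sketch of the naïve-sampling idea for CMABs, where a macro-arm is a tuple of per-slot choices: explore by sampling each slot independently (the "naïve" assumption that reward decomposes across slots), exploit via a global bandit over combinations already tried. The class name, the ε values, and the ε-greedy local policy are our own illustrative choices, not the paper's exact algorithm.

```python
import random
from collections import defaultdict

class NaiveSampling:
    """Illustrative naïve sampling for a combinatorial multi-armed bandit."""
    def __init__(self, slots, epsilon=0.3):
        self.slots = slots          # slots[i] = list of options for slot i
        self.epsilon = epsilon
        self.local = [defaultdict(lambda: [0.0, 0]) for _ in slots]  # per-slot [sum, count]
        self.global_ = defaultdict(lambda: [0.0, 0])                 # per-combination [sum, count]

    def select(self):
        if random.random() < self.epsilon or not self.global_:
            # Explore: choose each slot independently from its local bandit.
            return tuple(self._best_local(i) for i in range(len(self.slots)))
        # Exploit: best combination seen so far, by empirical mean reward.
        return max(self.global_,
                   key=lambda c: self.global_[c][0] / max(1, self.global_[c][1]))

    def _best_local(self, i):
        # Simple epsilon-greedy within each slot (an assumption of this sketch).
        if random.random() < self.epsilon:
            return random.choice(self.slots[i])
        return max(self.slots[i],
                   key=lambda o: self.local[i][o][0] / max(1, self.local[i][o][1]))

    def update(self, combo, reward):
        g = self.global_[combo]; g[0] += reward; g[1] += 1
        for i, o in enumerate(combo):
            l = self.local[i][o]; l[0] += reward; l[1] += 1

# Illustrative use: two slots; reward only when both choices are "b".
random.seed(0)
ns = NaiveSampling([["a", "b"], ["a", "b"]])
for _ in range(200):
    combo = ns.select()
    ns.update(combo, 1.0 if combo == ("b", "b") else 0.0)
print(ns.select())
```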

A stochastic bandit algorithm for scratch games

Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees; they are easy to implement and efficient in practice. We consider a new bandit setting, called “scratch-games”, where arm budgets are limited and rewards are drawn without rep...
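A hedged sketch of what such a setting looks like in code: UCB1 over arms with finite ticket pools, where pulls draw without replacement and an exhausted arm is retired. The function name, data layout, and the choice of plain UCB1 are our own illustrative assumptions, not the algorithm of the cited paper.

```python
import math
import random

def ucb_scratch(arms, horizon):
    """UCB1 over 'scratch-game' arms: arms[name] is a finite list of ticket
    rewards, drawn without replacement; an empty arm can no longer be played."""
    stats = {a: [0.0, 0] for a in arms}          # arm -> [reward sum, pulls]
    total_reward, t = 0.0, 0
    while t < horizon and any(arms[a] for a in arms):
        live = [a for a in arms if arms[a]]      # arms with tickets left
        t += 1
        untried = [a for a in live if stats[a][1] == 0]
        if untried:
            a = untried[0]                       # pull each live arm once first
        else:
            a = max(live, key=lambda b: stats[b][0] / stats[b][1]
                    + math.sqrt(2 * math.log(t) / stats[b][1]))
        r = arms[a].pop(random.randrange(len(arms[a])))  # draw without replacement
        stats[a][0] += r
        stats[a][1] += 1
        total_reward += r
    return total_reward

# Example: three scratch games with different ticket pools.
random.seed(1)
arms = {"A": [1, 0, 0, 1, 1], "B": [0, 0, 1, 0], "C": [1, 1, 0]}
print(ucb_scratch(arms, horizon=10))
```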

Playing in stochastic environment: from multi-armed bandits to two-player games

Given a zero-sum infinite game, we examine the question of whether players have optimal memoryless deterministic strategies. It turns out that, under some general conditions, the problem for two-player games can be reduced to the same problem for one-player games, which in turn can be reduced to a simpler related problem for multi-armed bandits. DOI: 10.4230/LIPIcs.FSTTCS.2010.65

Combinatorial Multi-armed Bandits for Real-Time Strategy Games

Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called naïve sampling, based on a variant of the Multi-armed Bandit problem called Combinatorial Multi-armed Bandits (CMAB). We analyze the theoretical properties of several variants of naïve...

Fast Seed-Learning Algorithms for Games

Recently, a methodology has been proposed for boosting the computational intelligence of randomized game-playing programs. We propose faster variants of these algorithms, namely rectangular algorithms (fully parallel) and bandit algorithms (faster in a sequential setup). We check the performance on several board games and card games. In addition, in the case of Go, we check the methodology when...

Non-trivial two-armed partial-monitoring games are bandits

We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the number of actions available to the learner is two and the game is non-trivial, then it is reducible to a bandit-like game, and thus the minimax regret is Θ(√T).
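For orientation, the minimax regret referred to here is the usual worst-case expected regret; a sketch of the definition in our own notation, not taken from the paper:

```latex
% Expected regret after T rounds (notation illustrative): the learner plays
% actions a_t, the oblivious adversary fixes outcomes x_t in advance, and
% \ell is the loss, observed only through the partial-monitoring feedback.
R_T = \mathbb{E}\!\left[\sum_{t=1}^{T} \ell(a_t, x_t)\right]
      - \min_{a} \sum_{t=1}^{T} \ell(a, x_t)
```

The abstract's claim is that for non-trivial two-action partial-monitoring games this quantity grows as Θ(√T), i.e. at the bandit rate.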


Journal:
  • Combinatorics, Probability & Computing

Volume 12, Issue -

Pages -

Publication date: 2003